Results 1 - 20 of 74
1.
Sci Data ; 11(1): 363, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605048

ABSTRACT

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.


Subjects
Biological Science Disciplines, Knowledge Bases, Automated Pattern Recognition, Algorithms, Biomedical Translational Research
2.
Am J Hum Genet ; 111(1): 11-23, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181729

ABSTRACT

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Subjects
Learning Health System, Precision Medicine, Humans, Biological Specimen Banks, Colorado, Genomics
3.
medRxiv ; 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38045364

ABSTRACT

Objective: The Multi-State EHR-Based Network for Disease Surveillance (MENDS) is a population-based chronic disease surveillance distributed data network that uses institution-specific extraction-transformation-load (ETL) routines. MENDS-on-FHIR examined using Health Level Seven's Fast Healthcare Interoperability Resources (HL7® FHIR®) and US Core Implementation Guide (US Core IG) compliant resources derived from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to create a standards-based ETL pipeline. Materials and Methods: The input data source was a research data warehouse containing clinical and administrative data in OMOP CDM Version 5.3 format. OMOP-to-FHIR transformations, using a unique JavaScript Object Notation (JSON)-to-JSON transformation language called Whistle, created FHIR R4 V4.0.1/US Core IG V4.0.0 conformant resources that were stored in a local FHIR server. A REST-based Bulk FHIR $export request extracted FHIR resources to populate a local MENDS database. Results: Eleven OMOP tables were used to create 10 FHIR/US Core compliant resource types. A total of 1.13 trillion resources were extracted and inserted into the MENDS repository. A very low rate of non-compliant resources was observed. Discussion: OMOP-to-FHIR transformation results passed validation with less than a 1% non-compliance rate. These standards-compliant FHIR resources provided standardized data elements required by the MENDS surveillance use case. The Bulk FHIR application programming interface (API) enabled population-level data exchange using interoperable FHIR resources. The OMOP-to-FHIR transformation pipeline creates a FHIR interface for accessing OMOP data. Conclusion: MENDS-on-FHIR successfully replaced custom ETL with standards-based interoperable FHIR resources using Bulk FHIR. The OMOP-to-FHIR transformations provide an alternative mechanism for sharing OMOP data.
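The Bulk FHIR $export step mentioned in this abstract can be illustrated with a minimal sketch of how a system-level kick-off request is typically assembled. The server URL, the resource types, and the function name `build_bulk_export_request` are hypothetical; the abstract does not say which parameters the MENDS-on-FHIR pipeline actually sent. The sketch only builds the request and does not contact a server.

```python
from urllib.parse import urlencode


def build_bulk_export_request(base_url, resource_types, since=None):
    """Return (url, headers) for a system-level Bulk FHIR $export kick-off.

    base_url and resource_types are illustrative inputs, not values
    taken from the MENDS-on-FHIR study.
    """
    params = {
        # NDJSON is the output format defined for Bulk FHIR exports.
        "_outputFormat": "application/fhir+ndjson",
        # Restrict the export to the listed resource types.
        "_type": ",".join(resource_types),
    }
    if since is not None:
        # Only include resources modified after this instant.
        params["_since"] = since
    url = f"{base_url.rstrip('/')}/$export?{urlencode(params)}"
    headers = {
        "Accept": "application/fhir+json",
        # Bulk export is asynchronous: the server replies with a status URL.
        "Prefer": "respond-async",
    }
    return url, headers


# Hypothetical server and resource types, for illustration only.
url, headers = build_bulk_export_request(
    "https://fhir.example.org/fhir", ["Patient", "Condition"]
)
print(url)
print(headers)
```

A client would send this as a GET request, poll the status URL returned in the response until the export completes, and then download the NDJSON files listed in the completion manifest.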

4.
Appl Clin Inform ; 14(5): 822-832, 2023 10.
Article in English | MEDLINE | ID: mdl-37852249

ABSTRACT

OBJECTIVES: In a randomized controlled trial, we found that applying implementation science (IS) methods and best practices in clinical decision support (CDS) design to create a locally customized, "enhanced" CDS significantly improved evidence-based prescribing of β-blockers (BB) for heart failure compared with an unmodified commercially available CDS. At trial conclusion, the enhanced CDS was expanded to all sites. The purpose of this study was to evaluate the real-world sustained effect of the enhanced CDS compared with the commercial CDS. METHODS: In this natural experiment of 28 primary care clinics, we compared clinics exposed to the commercial CDS (preperiod) to clinics exposed to the enhanced CDS (both periods). The primary effectiveness outcome was the proportion of alerts resulting in a BB prescription. Secondary outcomes included patient reach and clinician adoption (dismissals). RESULTS: There were 367 alerts for 183 unique patients and 171 unique clinicians (pre: March 2019-August 2019; post: October 2019-March 2020). The enhanced CDS increased prescribing by 26.1% compared with the commercial (95% confidence interval [CI]: 17.0-35.1%), which is consistent with the 24% increase in the previous study. The odds of adopting the enhanced CDS was 81% compared with 29% with the commercial (odds ratio: 4.17, 95% CI: 1.96-8.85). The enhanced CDS adoption and effectiveness rates were 62 and 14% in the preperiod and 92 and 10% in the postperiod. CONCLUSION: Applying IS methods with CDS best practices was associated with improved and sustained clinician adoption and effectiveness compared with a commercially available CDS tool.


Subjects
Clinical Decision Support Systems, Heart Failure, Humans, Heart Failure/drug therapy, Implementation Science
5.
NPJ Digit Med ; 6(1): 89, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

6.
EClinicalMedicine ; 58: 101932, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37034358

ABSTRACT

Background: Adverse events of special interest (AESIs) were pre-specified to be monitored for the COVID-19 vaccines. Some AESIs are not only associated with the vaccines, but with COVID-19. Our aim was to characterise the incidence rates of AESIs following SARS-CoV-2 infection in patients and compare these to historical rates in the general population. Methods: A multi-national cohort study with data from primary care, electronic health records, and insurance claims mapped to a common data model. This study's evidence was collected between Jan 1, 2017 and the conclusion of each database (which ranged from Jul 2020 to May 2022). The 16 pre-specified prevalent AESIs were: acute myocardial infarction, anaphylaxis, appendicitis, Bell's palsy, deep vein thrombosis, disseminated intravascular coagulation, encephalomyelitis, Guillain-Barré syndrome, haemorrhagic stroke, non-haemorrhagic stroke, immune thrombocytopenia, myocarditis/pericarditis, narcolepsy, pulmonary embolism, transverse myelitis, and thrombosis with thrombocytopenia. Age-sex standardised incidence rate ratios (SIR) were estimated to compare post-COVID-19 to pre-pandemic rates in each of the databases. Findings: Substantial heterogeneity by age was seen for AESI rates, with some clearly increasing with age but others following the opposite trend. Similarly, differences were also observed across databases for the same health outcome and age-sex strata. All studied AESIs appeared consistently more common in the post-COVID-19 compared to the historical cohorts, with related meta-analytic SIRs ranging from 1.32 (1.05 to 1.66) for narcolepsy to 11.70 (10.10 to 13.70) for pulmonary embolism. Interpretation: Our findings suggest all AESIs are more common after COVID-19 than in the general population. Thromboembolic events were particularly common, and over 10-fold more so. More research is needed to contextualise post-COVID-19 complications in the longer term. Funding: None.
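As a rough illustration of the age-sex standardised incidence rate ratio reported in this abstract, the sketch below uses indirect standardisation: observed events in the post-COVID-19 cohort are divided by the events expected if historical stratum-specific rates had applied to the same person-time. The abstract does not spell out the exact estimator used, so this is one common formulation rather than the study's method, and every number and field name below is invented for illustration.

```python
def standardised_incidence_ratio(strata):
    """Observed / expected events, with expected events computed by
    applying reference (historical) rates to the cohort's person-time.

    Each stratum is one age-sex group with hypothetical keys:
      observed_events - events seen in the post-COVID-19 cohort
      person_years    - follow-up time in that cohort
      reference_rate  - historical events per person-year
    """
    observed = sum(s["observed_events"] for s in strata)
    expected = sum(s["reference_rate"] * s["person_years"] for s in strata)
    return observed / expected


# Invented two-stratum example (not data from the study).
strata = [
    {"observed_events": 30, "person_years": 10_000, "reference_rate": 0.001},
    {"observed_events": 90, "person_years": 20_000, "reference_rate": 0.002},
]
sir = standardised_incidence_ratio(strata)
print(f"SIR = {sir:.2f}")
```

With these invented inputs the expected count is 10 + 40 = 50 events against 120 observed, so the ratio is 2.4; a value above 1 means more events than the historical rates would predict.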

7.
JAMIA Open ; 5(3): ooac071, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35936991

ABSTRACT

Objectives: Manual record review is a crucial step for electronic health record (EHR)-based research, but it has poor workflows and is error prone. We sought to build a tool that provides a unified environment for data review and chart abstraction data entry. Materials and Methods: ReviewR is an open-source R Shiny application that can be deployed on a single machine or made available to multiple users. It supports multiple data models and database systems, and integrates with the REDCap API for storing abstraction results. Results: We describe 2 real-world uses and extensions of ReviewR. Since its release in April 2021 as a package on CRAN, it has been downloaded 2204 times. Discussion and Conclusion: ReviewR provides an easily accessible review interface for clinical data warehouses. Its modular, extensible, and open-source nature affords future expansion by other researchers.

8.
Learn Health Syst ; 6(3): e10297, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35860322

ABSTRACT

Introduction: Learning health systems can help estimate chronic disease prevalence through distributed data networks (DDNs). Concerns remain about bias introduced to DDN prevalence estimates when individuals seeking care across systems are counted multiple times. This paper describes a process to deduplicate individuals for DDN prevalence estimates. Methods: We operationalized a two-step deduplication process, leveraging health information exchange (HIE)-assigned network identifiers, within the Colorado Health Observation Regional Data Service (CHORDS) DDN. We generated prevalence estimates for type 1 and type 2 diabetes among pediatric patients (0-17 years) with at least one 2017 encounter in one of two geographically-proximate DDN partners. We assessed the extent of cross-system duplication and its effect on prevalence estimates. Results: We identified 218 437 unique pediatric patients seen across systems during 2017, including 7628 (3.5%) seen in both. We found no measurable difference in prevalence after deduplication. The number of cases we identified differed slightly by data reconciliation strategy. Concordance of linked patients' demographic attributes varied by attribute. Conclusions: We implemented an HIE-dependent, extensible process that deduplicates individuals for less biased prevalence estimates in a DDN. Our null pilot findings have limited generalizability. Overlap was small and likely insufficient to influence prevalence estimates. Other factors, including the number and size of partners, the matching algorithm, and the electronic phenotype may influence the degree of deduplication bias. Additional use cases may help improve understanding of duplication bias and reveal other principles and insights. This study informed how DDNs could support learning health systems' response to public health challenges and improve regional health.
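The effect of cross-system duplication on a prevalence estimate can be sketched as follows. The sketch assumes each encounter record already carries an HIE-assigned network identifier and collapses records that share one; the record layout, the function name `deduplicated_prevalence`, and the "case at any site counts as a case" reconciliation rule are illustrative assumptions, not the CHORDS implementation (the abstract notes that case counts varied with the reconciliation strategy).

```python
def deduplicated_prevalence(records):
    """Collapse records sharing a network identifier, then estimate prevalence.

    records: iterable of (network_id, site, has_condition) tuples.
    Reconciliation rule (an assumption): a person is a case if any
    contributing site flags the condition.
    """
    is_case = {}
    sites_seen = {}
    for network_id, site, has_condition in records:
        is_case[network_id] = is_case.get(network_id, False) or has_condition
        sites_seen.setdefault(network_id, set()).add(site)

    unique_patients = len(is_case)
    seen_in_multiple = sum(1 for s in sites_seen.values() if len(s) > 1)
    cases = sum(1 for flag in is_case.values() if flag)
    return {
        "unique_patients": unique_patients,
        "seen_in_multiple": seen_in_multiple,
        "prevalence": cases / unique_patients,
    }


# Invented records: person N1 was seen at both sites A and B.
records = [
    ("N1", "A", False),
    ("N1", "B", True),
    ("N2", "A", False),
    ("N3", "B", False),
    ("N4", "B", True),
]
result = deduplicated_prevalence(records)
print(result)
```

Counting the five raw records would give a denominator of 5, whereas the deduplicated denominator is 4; when the overlap is as small as the 3.5% reported in the abstract, the two estimates are expected to differ very little.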

9.
Lancet Digit Health ; 4(7): e532-e541, 2022 07.
Article in English | MEDLINE | ID: mdl-35589549

ABSTRACT

BACKGROUND: Post-acute sequelae of SARS-CoV-2 infection, known as long COVID, have severely affected recovery from the COVID-19 pandemic for patients and society alike. Long COVID is characterised by evolving, heterogeneous symptoms, making it challenging to derive an unambiguous definition. Studies of electronic health records are a crucial element of the US National Institutes of Health's RECOVER Initiative, which is addressing the urgent need to understand long COVID, identify treatments, and accurately identify who has it; the latter is the aim of this study. METHODS: Using the National COVID Cohort Collaborative's (N3C) electronic health record repository, we developed XGBoost machine learning models to identify potential patients with long COVID. We defined our base population (n=1 793 604) as any non-deceased adult patient (age ≥18 years) with either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from an inpatient or emergency visit, or a positive SARS-CoV-2 PCR or antigen test, and for whom at least 90 days have passed since COVID-19 index date. We examined demographics, health-care utilisation, diagnoses, and medications for 97 995 adults with COVID-19. We used data on these features and 597 patients from a long COVID clinic to train three machine learning models to identify potential long COVID among all patients with COVID-19, patients hospitalised with COVID-19, and patients who had COVID-19 but were not hospitalised. Feature importance was determined via Shapley values. We further validated the models on data from a fourth site. FINDINGS: Our models identified, with high accuracy, patients who potentially have long COVID, achieving areas under the receiver operator characteristic curve of 0·92 (all patients), 0·90 (hospitalised), and 0·85 (non-hospitalised).
Important features, as defined by Shapley values, include rate of health-care utilisation, patient age, dyspnoea, and other diagnosis and medication information available within the electronic health record. INTERPRETATION: Patients identified by our models as potentially having long COVID can be interpreted as patients warranting care at a specialty clinic for long COVID, which is an essential proxy for long COVID diagnosis as its definition continues to evolve. We also achieve the urgent goal of identifying potential long COVID in patients for clinical trials. As more data sources are identified, our models can be retrained and tuned based on the needs of individual studies. FUNDING: US National Institutes of Health and National Center for Advancing Translational Sciences through the RECOVER Initiative.
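The headline metric in this abstract, the area under the receiver operator characteristic curve, can be computed directly from predicted scores and true labels. The sketch below is a plain-Python pairwise formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half). It does not reproduce the study's XGBoost models or data; the labels and scores are invented.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive case, 0 for a negative case.
    scores: model scores, where higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented labels and scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auroc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; rank-based implementations are the usual choice for cohorts of the size described in the abstract.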


Subjects
COVID-19, Adolescent, Adult, COVID-19/complications, COVID-19/diagnosis, COVID-19/epidemiology, COVID-19 Testing, Humans, Machine Learning, Pandemics, SARS-CoV-2, United States/epidemiology, Post-Acute COVID-19 Syndrome
10.
AMIA Annu Symp Proc ; 2022: 319-328, 2022.
Article in English | MEDLINE | ID: mdl-37128436

ABSTRACT

Patient representation learning methods create rich representations of complex data and have potential to further advance the development of computational phenotypes (CP). Currently, these methods are either applied to small predefined concept sets or all available patient data, limiting the potential for novel discovery and reducing the explainability of the resulting representations. We report on an extensive, data-driven characterization of the utility of patient representation learning methods for the purpose of CP development or automatization. We conducted ablation studies to examine the impact of patient representations, built using data from different combinations of data types and sampling windows on rare disease classification. We demonstrated that the data type and sampling window directly impact classification and clustering performance, and these results differ by rare disease group. Our results, although preliminary, exemplify the importance of and need for data-driven characterization in patient representation-based CP development pipelines.


Subjects
Machine Learning, Rare Diseases, Humans, Phenotype
11.
J Am Med Inform Assoc ; 29(4): 592-600, 2022 03 15.
Article in English | MEDLINE | ID: mdl-34919694

ABSTRACT

OBJECTIVE: Clinical research data warehouses (RDWs) linked to genomic pipelines and open data archives are being created to support innovative, complex data-driven discoveries. The computing and storage needs of these research environments may quickly exceed the capacity of on-premises systems. New RDWs are migrating to cloud platforms for the scalability and flexibility needed to meet these challenges. We describe our experience in migrating a multi-institutional RDW to a public cloud. MATERIALS AND METHODS: This study is descriptive. Primary materials included internal and public presentations before and after the transition, analysis documents, and actual billing records. Findings were aggregated into topical categories. RESULTS: Eight categories of migration issues were identified. Unanticipated challenges included legacy system limitations; network, computing, and storage architectures that realize performance and cost benefits in the face of hyper-innovation; complex security reviews and approvals; and limited cloud consulting expertise. DISCUSSION: Cloud architectures enable previously unavailable capabilities, but numerous pitfalls can impede realizing the full benefits of a cloud environment. Rapid changes in cloud capabilities can quickly render existing architectures and associated institutional policies obsolete. Touchpoints with on-premises networks and systems can add unforeseen complexity. Governance, resource management, and cost oversight are critical to allow rapid innovation while minimizing wasted resources and unnecessary costs. CONCLUSIONS: Migrating our RDW to the cloud has enabled capabilities and innovations that would not have been possible with an on-premises environment. Notwithstanding the challenges of managing cloud resources, the resulting RDW capabilities have been highly positive to our institution, research community, and partners.


Subjects
Cloud Computing, Data Warehousing
12.
JMIR Mhealth Uhealth ; 9(12): e31618, 2021 12 23.
Article in English | MEDLINE | ID: mdl-34941540

ABSTRACT

BACKGROUND: There is a growing interest in using person-generated wearable device data for biomedical research, but there are also concerns regarding the quality of data such as missing or incorrect data. This emphasizes the importance of assessing data quality before conducting research. In order to perform data quality assessments, it is essential to define what data quality means for person-generated wearable device data by identifying the data quality dimensions. OBJECTIVE: This study aims to identify data quality dimensions for person-generated wearable device data for research purposes. METHODS: This study was conducted in 3 phases: literature review, survey, and focus group discussion. The literature review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guideline to identify factors affecting data quality and its associated data quality challenges. In addition, we conducted a survey to confirm and complement results from the literature review and to understand researchers' perceptions on data quality dimensions that were previously identified as dimensions for the secondary use of electronic health record (EHR) data. We sent the survey to researchers with experience in analyzing wearable device data. Focus group discussion sessions were conducted with domain experts to derive data quality dimensions for person-generated wearable device data. On the basis of the results from the literature review and survey, a facilitator proposed potential data quality dimensions relevant to person-generated wearable device data, and the domain experts accepted or rejected the suggested dimensions. RESULTS: In total, 19 studies were included in the literature review, and 3 major themes emerged: device- and technical-related, user-related, and data governance-related factors. The associated data quality problems were incomplete data, incorrect data, and heterogeneous data. A total of 20 respondents answered the survey. 
The major data quality challenges faced by researchers were completeness, accuracy, and plausibility. The importance ratings on data quality dimensions in an existing framework showed that the dimensions for secondary use of EHR data are applicable to person-generated wearable device data. There were 3 focus group sessions with domain experts in data quality and wearable device research. The experts concluded that intrinsic data quality features, such as conformance, completeness, and plausibility, and contextual and fitness-for-use data quality features, such as completeness (breadth and density) and temporal data granularity, are important data quality dimensions for assessing person-generated wearable device data for research purposes. CONCLUSIONS: In this study, intrinsic and contextual and fitness-for-use data quality dimensions for person-generated wearable device data were identified. The dimensions were adapted from data quality terminologies and frameworks for the secondary use of EHR data with a few modifications. Further research on how data quality can be assessed with respect to each dimension is needed.


Subjects
Data Accuracy, Wearable Electronic Devices, Electronic Health Records, Humans, Research Design, Surveys and Questionnaires
13.
AMIA Jt Summits Transl Sci Proc ; 2021: 430-437, 2021.
Article in English | MEDLINE | ID: mdl-34457158

ABSTRACT

One of the challenges of teaching applied data science courses is managing individual students' local computing environment. This is especially challenging when teaching massively open online courses (MOOCs) where students come from across the globe and have a variety of access to and types of computing systems. There are additional challenges with using sensitive health information for clinical data science education. Here we describe the development and performance of a computing platform developed to support a series of MOOCs in clinical data science. This platform was designed to restrict and log all access to health datasets while also being scalable, accessible, secure, privacy preserving, and easy to access. Over the 19 months the platform has been live it has supported the computation of more than 2300 students from 101 countries.


Subjects
Distance Education, Data Science, Humans, Students
14.
SN Comput Sci ; 2(4): 279, 2021.
Article in English | MEDLINE | ID: mdl-34027432

ABSTRACT

Anomaly detection and explanation in big volumes of real-world medical data, such as those pertaining to COVID-19, pose some challenges. First, we are dealing with time-series data. Typical time-series data describe behavior of a single object over time. In medical data, we are dealing with time-series data belonging to multiple entities. Thus, there may be multiple subsets of records such that records in each subset, which belong to a single entity are temporally dependent, but the records in different subsets are unrelated. Moreover, the records in a subset contain different types of attributes, some of which must be grouped in a particular manner to make the analysis meaningful. Anomaly detection techniques need to be customized for time-series data belonging to multiple entities. Second, anomaly detection techniques fail to explain the cause of outliers to the experts. This is critical for new diseases and pandemics where current knowledge is insufficient. We propose to address these issues by extending our existing work called IDEAL, which is an LSTM-autoencoder based approach for data quality testing of sequential records, and provides explanations of constraint violations in a manner that is understandable to end-users. The extension (1) uses a novel two-level reshaping technique that splits COVID-19 data sets into multiple temporally-dependent subsequences and (2) adds a data visualization plot to further explain the anomalies and evaluate the level of abnormality of subsequences detected by IDEAL. We performed two systematic evaluation studies for our anomalous subsequence detection. One study uses aggregate data, including the number of cases, deaths, recovered, and percentage of hospitalization rate, collected from a COVID tracking project, New York Times, and Johns Hopkins for the same time period. The other study uses COVID-19 patient medical records obtained from Anschutz Medical Center health data warehouse. 
The results are promising and indicate that our techniques can be used to detect anomalies in large volumes of real-world unlabeled data whose accuracy or validity is unknown.
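The idea of reshaping multi-entity time-series data into temporally dependent subsequences can be sketched in two levels: first group records by the entity they belong to, then cut each entity's time-ordered values into overlapping windows. The abstract does not describe the actual reshaping used by the IDEAL extension, so this is only one plausible reading; the record layout, the window size, and the function name `reshape_two_level` are assumptions.

```python
from collections import defaultdict


def reshape_two_level(records, window):
    """Split multi-entity records into per-entity sliding windows.

    records: list of (entity_id, timestamp, value) tuples in any order.
    window:  number of consecutive time steps per subsequence.
    Returns a dict mapping entity_id to a list of value windows.
    """
    # Level 1: separate records so that unrelated entities never mix.
    by_entity = defaultdict(list)
    for entity_id, timestamp, value in records:
        by_entity[entity_id].append((timestamp, value))

    # Level 2: order each entity's records in time and slide a window.
    subsequences = {}
    for entity_id, rows in by_entity.items():
        rows.sort(key=lambda row: row[0])
        values = [value for _, value in rows]
        subsequences[entity_id] = [
            values[i:i + window] for i in range(len(values) - window + 1)
        ]
    return subsequences


# Invented daily counts for two entities, deliberately out of order.
records = [
    ("NY", 2, 12), ("NY", 1, 10), ("CO", 1, 5), ("NY", 3, 15), ("CO", 2, 7),
]
print(reshape_two_level(records, window=2))
```

Each resulting window could then be fed to a sequence model such as an LSTM autoencoder; entities with fewer records than the window size simply yield no subsequences.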

15.
JMIR Med Inform ; 9(3): e24359, 2021 Mar 22.
Article in English | MEDLINE | ID: mdl-33749610

ABSTRACT

BACKGROUND: Limited consideration of clinical decision support (CDS) design best practices, such as a user-centered design, is often cited as a key barrier to CDS adoption and effectiveness. The application of CDS best practices is resource intensive; thus, institutions often rely on commercially available CDS tools that are created to meet the generalized needs of many institutions and are not user centered. Beyond resource availability, insufficient guidance on how to address key aspects of implementation, such as contextual factors, may also limit the application of CDS best practices. An implementation science (IS) framework could provide needed guidance and increase the reproducibility of CDS implementations. OBJECTIVE: This study aims to compare the effectiveness of an enhanced CDS tool informed by CDS best practices and an IS framework with a generic, commercially available CDS tool. METHODS: We conducted an explanatory sequential mixed methods study. An IS-enhanced and commercial CDS alert were compared in a cluster randomized trial across 28 primary care clinics. Both alerts aimed to improve beta-blocker prescribing for heart failure. The enhanced alert was informed by CDS best practices and the Practical, Robust, Implementation, and Sustainability Model (PRISM) IS framework, whereas the commercial alert followed vendor-supplied specifications. Following PRISM, the enhanced alert was informed by iterative, multilevel stakeholder input and the dynamic interactions of the internal and external environment. Outcomes aligned with PRISM's evaluation measures, including patient reach, clinician adoption, and changes in prescribing behavior. Clinicians exposed to each alert were interviewed to identify design features that might influence adoption. The interviews were analyzed using a thematic approach. 
RESULTS: Between March 15 and August 23, 2019, the enhanced alert fired for 61 patients (106 alerts, 87 clinicians) and the commercial alert fired for 26 patients (59 alerts, 31 clinicians). The adoption and effectiveness of the enhanced alert were significantly higher than those of the commercial alert (62% vs 29% alerts adopted, P<.001; 14% vs 0% changed prescribing, P=.006). Of the 21 clinicians interviewed, most stated that they preferred the enhanced alert. CONCLUSIONS: The results of this study suggest that applying CDS best practices with an IS framework to create CDS tools improves implementation success compared with a commercially available tool. TRIAL REGISTRATION: ClinicalTrials.gov NCT04028557; http://clinicaltrials.gov/ct2/show/NCT04028557.

16.
JMIR Mhealth Uhealth ; 9(3): e20738, 2021 03 19.
Article in English | MEDLINE | ID: mdl-33739294

ABSTRACT

BACKGROUND: There is increasing interest in reusing person-generated wearable device data for research purposes, which raises concerns about data quality. However, the amount of literature on data quality challenges, specifically those for person-generated wearable device data, is sparse. OBJECTIVE: This study aims to systematically review the literature on factors affecting the quality of person-generated wearable device data and their associated intrinsic data quality challenges for research. METHODS: The literature was searched in the PubMed, Association for Computing Machinery, Institute of Electrical and Electronics Engineers, and Google Scholar databases by using search terms related to wearable devices and data quality. By using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, studies were reviewed to identify factors affecting the quality of wearable device data. Studies were eligible if they included content on the data quality of wearable devices, such as fitness trackers and sleep monitors. Both research-grade and consumer-grade wearable devices were included in the review. Relevant content was annotated and iteratively categorized into semantically similar factors until a consensus was reached. If any data quality challenges were mentioned in the study, those contents were extracted and categorized as well. RESULTS: A total of 19 papers were included in this review. We identified three high-level factors that affect data quality: device- and technical-related factors, user-related factors, and data governance-related factors. Device- and technical-related factors include problems with hardware, software, and the connectivity of the device; user-related factors include device nonwear and user error; and data governance-related factors include a lack of standardization. The identified factors can potentially lead to intrinsic data quality challenges, such as incomplete, incorrect, and heterogeneous data. Although missing and incorrect data are widely known data quality challenges for wearable devices, the heterogeneity of data is another aspect of data quality that should be considered for wearable devices. Heterogeneity in wearable device data exists at three levels: heterogeneity in data generated by a single person using a single device (within-person heterogeneity); heterogeneity in data generated by multiple people who use the same brand, model, and version of a device (between-person heterogeneity); and heterogeneity in data generated from multiple people using different devices (between-device heterogeneity), which would apply especially to data collected under a bring-your-own-device policy. CONCLUSIONS: Our study identifies potential intrinsic data quality challenges that could occur when analyzing wearable device data for research and three major contributing factors for these challenges. As poor data quality can compromise the reliability and accuracy of research results, further investigation is needed on how to address the data quality challenges of wearable devices.


Subjects
Wearable Electronic Devices ; Fitness Trackers ; Humans ; Reproducibility of Results
17.
J Am Med Inform Assoc ; 28(7): 1591-1599, 2021 Jul 14.
Article in English | MEDLINE | ID: mdl-33496785

ABSTRACT

OBJECTIVE: Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators and tools for research, public health, service, and quality improvement across the data life cycle. MATERIALS AND METHODS: The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from health, physical and social sciences were used: Cinahl, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms used and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached. RESULTS: The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found. CONCLUSIONS: A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.


Subjects
Data Accuracy ; Quality Improvement ; Animals ; Life Cycle Stages
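
The abstract for entry 17 mentions computable DQ indicators without naming one. As a rough illustration only, the sketch below computes a simple completeness indicator (the share of records carrying a non-missing value for a field). The function name, the record layout, and the `heart_rate` field are hypothetical and are not taken from the reviewed frameworks.

```python
from typing import Iterable, Mapping


def completeness(records: Iterable[Mapping[str, object]], field: str) -> float:
    """Return the fraction of records with a non-missing value for `field`.

    A value counts as missing when the key is absent, or when the value
    is None or an empty string. An empty input yields 0.0.
    """
    rows = list(records)
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(field) not in (None, ""))
    return present / len(rows)


# Hypothetical toy records, invented for illustration only.
records = [
    {"person_id": 1, "heart_rate": 72},
    {"person_id": 2, "heart_rate": None},  # explicit missing value
    {"person_id": 3, "heart_rate": 65},
    {"person_id": 4},                      # field absent entirely
]

print(completeness(records, "heart_rate"))
```

An indicator of this kind only addresses the intrinsic category; the contextual and technical categories described in the abstract would need information beyond the data values themselves.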
18.
Commun Med (Lond) ; 1(1): 42, 2021 Oct 26.
Article in English | MEDLINE | ID: mdl-36750622

ABSTRACT

BACKGROUND: Since the onset of the SARS-CoV-2 pandemic, most clinical testing has focused on RT-PCR [1]. Host epigenome manipulation post coronavirus infection [2-4] suggests that DNA methylation signatures may differentiate patients with SARS-CoV-2 infection from uninfected individuals, and help predict COVID-19 disease severity, even at initial presentation. METHODS: We customized Illumina's Infinium MethylationEPIC array to enhance immune response detection and profiled peripheral blood samples from 164 COVID-19 patients with longitudinal measurements of disease severity and 296 patient controls. RESULTS: Epigenome-wide association analysis revealed 13,033 genome-wide significant methylation sites for case-vs-control status. Genes and pathways involved in interferon signaling and viral response were significantly enriched among differentially methylated sites. We observe highly significant associations at genes previously reported in genetic association studies (e.g. IRF7, OAS1). Using machine learning techniques, models built using sparse regression yielded highly predictive findings: cross-validated best fit AUC was 93.6% for case-vs-control status, and 79.1%, 80.8%, and 84.4% for hospitalization, ICU admission, and progression to death, respectively. CONCLUSIONS: In summary, the strong COVID-19-specific epigenetic signature in peripheral blood driven by key immune-related pathways related to infection status, disease severity, and clinical deterioration provides insights useful for diagnosis and prognosis of patients with viral infections.


Viral infections affect the body in many ways, including via changes to the epigenome, the sum of chemical modifications to an individual's collection of genes that affect gene activity. Here, we analyzed the epigenome in blood samples from people with and without COVID-19 to determine whether we could find changes consistent with SARS-CoV-2 infection. Using a combination of statistical and machine learning techniques, we identify markers of SARS-CoV-2 infection as well as of severity and progression of COVID-19 disease. These signals of disease progression were present from the initial blood draw when first walking into the hospital. Together, these approaches demonstrate the potential of measuring the epigenome for monitoring SARS-CoV-2 status and severity.
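
The results for entry 18 are reported as cross-validated AUC. As a minimal sketch of what that metric measures, and not the authors' pipeline, the code below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, then averages it over held-out folds. The function names and the toy labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Labels are 1 for cases and 0 for controls. Tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_cv_auc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the AUC over held-out folds of (labels, predicted scores)."""
    return sum(roc_auc(y, s) for y, s in folds) / len(folds)


# Hypothetical held-out predictions for two folds, invented for illustration.
folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.2, 0.7, 0.3]),
]

print(mean_cv_auc(folds))
```

The pairwise form is quadratic in sample size, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large cohorts.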

19.
Commun Med (Lond) ; 1(1): 42, 2021.
Article in English | MEDLINE | ID: mdl-35072167

ABSTRACT

BACKGROUND: Since the onset of the SARS-CoV-2 pandemic, most clinical testing has focused on RT-PCR [1]. Host epigenome manipulation post coronavirus infection [2-4] suggests that DNA methylation signatures may differentiate patients with SARS-CoV-2 infection from uninfected individuals, and help predict COVID-19 disease severity, even at initial presentation. METHODS: We customized Illumina's Infinium MethylationEPIC array to enhance immune response detection and profiled peripheral blood samples from 164 COVID-19 patients with longitudinal measurements of disease severity and 296 patient controls. RESULTS: Epigenome-wide association analysis revealed 13,033 genome-wide significant methylation sites for case-vs-control status. Genes and pathways involved in interferon signaling and viral response were significantly enriched among differentially methylated sites. We observe highly significant associations at genes previously reported in genetic association studies (e.g. IRF7, OAS1). Using machine learning techniques, models built using sparse regression yielded highly predictive findings: cross-validated best fit AUC was 93.6% for case-vs-control status, and 79.1%, 80.8%, and 84.4% for hospitalization, ICU admission, and progression to death, respectively. CONCLUSIONS: In summary, the strong COVID-19-specific epigenetic signature in peripheral blood driven by key immune-related pathways related to infection status, disease severity, and clinical deterioration provides insights useful for diagnosis and prognosis of patients with viral infections.

20.
Pediatr Diabetes ; 22(1): 31-39, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32134536

ABSTRACT

OBJECTIVE: To compare treatment regimens and glycosylated hemoglobin (A1c) levels in Type 1 (T1D) and Type 2 diabetes (T2D) using diabetes registries from two countries: the U.S. SEARCH for Diabetes in Youth (SEARCH) and the Indian Registry of youth onset diabetes in India (YDR). METHODS: The SEARCH and YDR data were harmonized to the structure and terminology in the Observational Medical Outcomes Partnership Common Data Model. Data used were from T1D and T2D youth diagnosed <20 years between 2006-2012 for YDR, and 2006, 2008, and 2012 for SEARCH. We compared treatment regimens and A1c levels across the two registries. RESULTS: There were 4003 T1D (SEARCH = 1899; YDR = 2104) and 611 T2D (SEARCH = 384; YDR = 227) youth. The mean A1c was higher in YDR compared to SEARCH (T1D: 11.0% ± 2.9% vs 7.8% ± 1.7%, P < .001; T2D: 9.9% ± 2.8% vs 7.2% ± 2.1%, P < .001). Among T1D youth in SEARCH, 65.1% were on a basal/bolus regimen, whereas in YDR, 52.8% were on a once/twice daily insulin regimen. Pumps were used by 16.2% of SEARCH and 1.5% of YDR youth with T1D. Among T2D youth, in SEARCH and YDR, a majority were on metformin only (43.0% vs 30.0%), followed by insulin + any oral hypoglycemic agents (26.3% vs 13.7%) and insulin only (12.8% vs 18.9%), respectively. CONCLUSION: We found significant differences between SEARCH and YDR in treatment patterns in T1D and T2D. A1c levels were higher in YDR than SEARCH youth, for both T1D and T2D, irrespective of the regimens used. Efforts to achieve better glycemic control for youth are urgently needed to reduce the risk of long-term complications.


Subjects
Diabetes Mellitus, Type 1/therapy ; Diabetes Mellitus, Type 2/therapy ; Glycated Hemoglobin/analysis ; Adolescent ; Child ; Diabetes Mellitus, Type 1/blood ; Diabetes Mellitus, Type 2/blood ; Female ; Humans ; India ; Male ; Registries ; Treatment Outcome ; United States